Performance of PRISM III and PIM 2 scores in a cancer pediatric intensive care unit

Objective To assess the performance of the Pediatric Risk of Mortality (PRISM) III and Pediatric Index of Mortality (PIM) 2 scores in a cancer pediatric intensive care unit. Methods A retrospective cohort study. Data were collected from the medical records of all patients admitted to the pediatric intensive care unit of a cancer hospital from January 2017 to June 2018. Results The mean PRISM III score was 15, and PIM 2, 24%. Of the 338 patients studied, 62 (18.34%) died. The estimated mortality was 79.52 patients (23.52%) for PRISM III and 80.19 patients (23.72%) for PIM 2, corresponding to a standardized mortality ratio of 0.78 for PRISM III and 0.77 for PIM 2. The Hosmer-Lemeshow chi-square test was 11.56, 8df, p = 0.975 for PRISM III and 0.48, 8df, p = 0.999 for PIM 2. The area under the Receiver Operating Characteristic curve was 0.71 for PRISM III and 0.76 for PIM 2. Conclusion Both scores overestimated mortality and showed a fair ability to discriminate between survivors and non-survivors. Models should be developed to quantify the severity of illness of pediatric cancer patients in pediatric intensive care units and to predict their mortality risk, accounting for their peculiarities.


INTRODUCTION
Scoring systems provide benchmarks that can be recognized by different observers. They are used to indicate severity and to assess mortality risk in the intensive care unit (ICU). These systems help identify and solve problems and aim to measure the severity of the disease, calibrating those data to a given outcome, such as death or survival. Their results are also indicators of the quality of the service provided and are useful for internal and external benchmarking. (1) Implementing these systems is highly important for prognostic precision and accuracy in cancer patients admitted to the pediatric intensive care unit (PICU), as this group of patients is characterized by high mortality rates and therefore requires early and effective prediction of untoward outcomes.
Initially, this assessment was subjective, as in the clinical rating system, in which patients were grouped according to their stability and their requirements for therapeutic intervention. (2) In 1974, Cullen created the Therapeutic Intervention Scoring System (TISS), an indirect and objective method of analyzing the severity of the disease based on therapeutic resources and on factors causing clinical worsening of the patient. This method was later revised by Keene, in 1983. (2,3) Scores also emerged for specific clinical conditions, such as the Glasgow coma scale and the Injury Severity Score. (4,5) Objective quantitative scores emerged as advances in clinical data were combined with statistical tools to determine the relevant clinical variables, allowing mathematical formulas to be correlated with the percentage mortality risk. (6) An example of this type of score is the Physiologic Stability Index, which after a revision process gave rise to the Pediatric Risk of Mortality (PRISM). (7,8) The main scores developed for the pediatric population are the PRISM (9-12) and the Pediatric Index of Mortality (PIM) (13-15) and their new versions, PRISM IV (11) and PIM 3. (13,14) These scores were developed by identifying the variables relevant to mortality risk, which were then weighted by logistic regression analysis.
In Brazil, 420,000 new cases of cancer were estimated for 2019, excluding non-melanoma skin cancer. (16,17) As the median percentage of childhood and adolescent tumors in the Brazilian Cancer Registry is about 3%, approximately 12,500 new cases of cancer are expected in children and adolescents (up to the age of 19). (18) In recent decades, there has been a marked increase in the overall survival of children with cancer, (17) with the rate of survival at five or more years rising from an average of 58% during the 1970s to above 80% currently in developed countries. (19,20) However, in developing (low- and middle-income) countries, the expected cure rate remains around 20%. (21-23) These improvements in mortality and survival are accompanied by an increase in complications, such as respiratory and cardiovascular failure, as well as neurological problems, which may require admission to the PICU, where most supportive therapies can be provided. (24,25) Studies on the performance of severity scores in children with onco-hematological diseases are scarce, and their prognostic results vary widely with the population studied. (1) These divergences raise questions about the use of these scores in pediatric oncology. Unfortunately, despite numerous efforts, there is still no mortality prediction score specifically developed for pediatric non-bone marrow transplantation cancer patients. (26)
We should point out that during the period of data collection for this study, PRISM IV was not yet in the public domain, and the standard adopted institutionally for this assessment was based on PRISM III and PIM 2.
This study aimed to assess the performance and internal validation of PRISM III and PIM 2 in a reference hospital for pediatric oncology.

METHODS
A retrospective cohort study was conducted. Data were collected from the medical records of all patients admitted to the PICU of the Hospital Oncológico Infantil Octávio Lobo in Belém, Pará, in the Brazilian Amazon region, from January 2017 to June 2018.
Patients admitted to the PICU for longer than 8 hours were included. Patients were excluded if they stayed for less than 8 hours (or less than 4 hours in case of death); were admitted in cardiorespiratory arrest or did not achieve stable vital signs within 12 hours; were in palliative care or had a do-not-resuscitate order; or had brain death.
The assessed variables constituted three groups: clinical-epidemiological characterization; score system calculation, corresponding to the first 24 hours from admission for analysis of the PRISM score system; and outcome. Demographics and clinical information were included for the sample stratification.
A database was assembled in Excel® 2010 spreadsheets. The Hosmer-Lemeshow test was used to assess the calibration of the models. (25) The analysis was conducted by dividing the patients into ten mortality risk strata, to compare observed and expected mortality. To assess discrimination between survivors and non-survivors, the area under the Receiver Operating Characteristic (ROC) curve was calculated. (26) To quantify the quality of care in the PICU using the mortality scores, the standardized mortality ratio (SMR), (27) which compares observed with estimated deaths, was adopted.
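The calibration and discrimination measures described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names `hosmer_lemeshow` and `roc_auc`, the equal-sized strata, and the toy data are assumptions made for illustration only. The sketch returns the Hosmer-Lemeshow chi-square statistic with its degrees of freedom (number of strata minus 2, that is, 8 for ten strata) and computes the area under the ROC curve as the proportion of non-survivor/survivor pairs in which the non-survivor received the higher predicted risk.

```python
from typing import Sequence, Tuple


def hosmer_lemeshow(probs: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) over risk strata.

    probs: predicted probability of death per patient (hypothetical input).
    died:  1 if the patient died, 0 if the patient survived.
    """
    # Sort patients by predicted risk and cut them into equal-sized strata.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    chi2 = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        expected_deaths = sum(probs[i] for i in idx)
        observed_deaths = sum(died[i] for i in idx)
        expected_survivors = len(idx) - expected_deaths
        observed_survivors = len(idx) - observed_deaths
        # Each stratum contributes one term for deaths and one for survivors.
        if expected_deaths > 0:
            chi2 += (observed_deaths - expected_deaths) ** 2 / expected_deaths
        if expected_survivors > 0:
            chi2 += (observed_survivors - expected_survivors) ** 2 / expected_survivors
    return chi2, groups - 2


def roc_auc(probs: Sequence[float], died: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    non_survivors = [p for p, d in zip(probs, died) if d == 1]
    survivors = [p for p, d in zip(probs, died) if d == 0]
    if not non_survivors or not survivors:
        raise ValueError("both outcomes must be present")
    wins = 0.0
    for p in non_survivors:
        for q in survivors:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(non_survivors) * len(survivors))


# Toy data (hypothetical, not from the study cohort).
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
toy_died = [0, 0, 0, 1, 0, 1]
toy_auc = roc_auc(toy_probs, toy_died)
toy_chi2, toy_df = hosmer_lemeshow(toy_probs, toy_died, groups=2)
```

A small chi-square statistic (large p-value) indicates that observed and expected deaths agree across strata, whereas an area under the ROC curve close to 1 indicates that the score ranks non-survivors above survivors.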

RESULTS
During the study period, there were 489 hospitalizations. Of these, 338 (69.1%) were included; the 151 (30.8%) excluded cases had incomplete information or did not meet the inclusion criteria. Most patients were female (50.9%); the median age was 8 years (standard deviation ± 5 years), ranging from 3 months to 18 years (Table 1).
Of the 338 patients studied, 62 (18.3%) died, and 38 (61.5%) of these deaths were caused by septic shock and/or multi-organ dysfunction.
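As a worked check of the standardized mortality ratio, the short Python sketch below divides the observed deaths by the deaths estimated by each score, using the figures reported in the abstract (62 observed deaths; 79.52 expected by PRISM III and 80.19 expected by PIM 2). The function and variable names are illustrative assumptions, not part of the study's analysis.

```python
def standardized_mortality_ratio(observed_deaths: float, expected_deaths: float) -> float:
    """SMR = observed deaths / expected deaths; values below 1 mean the score overestimated mortality."""
    return observed_deaths / expected_deaths


patients = 338
observed = 62
# Expected deaths as reported in the abstract for each score.
expected = {"PRISM III": 79.52, "PIM 2": 80.19}

crude_mortality = observed / patients
smr = {name: round(standardized_mortality_ratio(observed, e), 2)
       for name, e in expected.items()}

print(f"Observed mortality: {100 * crude_mortality:.2f}%")
for name, value in smr.items():
    print(f"SMR for {name}: {value}")
```

Both ratios fall below 1 (0.78 and 0.77), consistent with the statement that both scores overestimated mortality in this cohort.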
Tables 2 and 3 compare the observed and expected mortality by mortality risk strata, using the Hosmer-Lemeshow goodness-of-fit test for PRISM III in the first 24 hours, and for PIM 2 estimated from the entire sample of the original score, respectively (chi-square = 11.56; 8df; p = 0.975 for PRISM III; chi-square = 0.48; 8df; p = 0.999 for PIM 2).

DISCUSSION
Regarding the performance of the scores with respect to overall population mortality, as measured by the SMR, both PRISM III and PIM 2 overestimated it. Both scores were created some years ago and may not have considered the current population of children and adolescents with complex chronic illness, which may have influenced this difference between the observed and expected mortality. Some studies have found similar results. (27)
Evaluation of the discriminatory performance of the models using the area under the ROC curve showed that both PRISM III and PIM 2 have a fair ability to discriminate between survivors and non-survivors (0.71 for PRISM III and 0.76 for PIM 2). Many authors have reported that PRISM III overestimates mortality (27) and fails to show good calibration and discrimination in specific populations. (28-30)
The study population had an overall mortality rate of 18.3%, and 61.5% of these deaths were due to septic shock/multi-organ dysfunction. Other studies have shown mortality rates close to this or higher. (31) The development of potentially serious infections is probably associated with the degree of immunosuppression, resulting from both the underlying neoplastic disease and the post-chemotherapy condition. (32,33) It is also important to emphasize that during the study period, sepsis protocols and bundles for preventing care-related infections had not yet been implemented. This may have contributed to the higher mortality rate.
This study has limitations. Because it was based on a retrospective review of medical records, bias in data collection and interpretation must be considered; it was also a single-center study. Additionally, a large portion of patients (30.8%) were excluded from the study. However, as strengths, the study had a moderate sample size and is a pioneer in the region.
The literature still lacks studies evaluating the outcome of pediatric cancer patients admitted to the PICU. In cancer patient care, it is necessary to develop models to quantify the severity of the disease and to predict the mortality risk, accounting for the peculiarities of these patients. In the future, such models may provide better predictions of the disease's course.

CONCLUSION
In the oncology pediatric intensive care unit, both scores overestimated mortality relative to that actually observed. The predictive models studied showed a fair ability to discriminate between survivors and non-survivors among children and adolescents with cancer. PIM 2 was superior to PRISM III. These scores are therefore important tools for the prognostic assessment of these patients. It is important to emphasize that this was the first study of its kind to be carried out in this specific population sample, and additional research is required for better calibration and validation of these scores in this population.